ABSTRACT

There has been a recent explosion in the size of stored data, 
partially due to advances in storage technology, and partially 
due to the growing popularity of cloud-computing and the 
vast quantities of data generated. This motivates the need 
for streaming algorithms that can compute approximate so- 
lutions without full random access to all of the data. 

We model the problem of loading a graph onto a distributed
cluster as computing an approximately balanced k-partitioning
of a graph in a streaming fashion with only one pass over the
data. We give lower bounds on this problem, showing that no
algorithm can obtain an o(n) approximation with a random or
adversarial stream ordering. We analyze two variants of a
randomized greedy algorithm, one that prefers the arg max and
one that is proportional, on random graphs with embedded
balanced k-cuts, and are able to theoretically bound the
performance of each algorithm: the arg max algorithm is able
to recover the embedded k-cut, while, surprisingly, the
proportional variant cannot. This matches the experimental
results in [25].

1. INTRODUCTION 

Recent advances in storage technology and distributed 
computing have led to the phenomenon known as big 
data. There are several paradigm shifts involved in the 
big data movement. From a theoretical perspective, one 
is that traditional assumptions like full uniform random 
access to the data are no longer reasonable. This shift 
motivates the study of streaming algorithms. 

One very natural type of 'big data' is graphs. There 
has been a lot of interest in distributed graph com- 
putation systems from many different communities - 
systems builders, database experts, and machine learn- 
ing. This has led to the creation of a huge number of 
such systems including GraphLab [18], Pregel, Spark [20], Hor-,
Trinity, and the filtering technique for
MapReduce to name but a few. In terms of distribut- 
ing data, some of these systems support custom par- 
titionings but the vast majority use a hashing method 
to produce a random cut as the default partitioning. 
From a systems perspective, this approach makes sense:
it is fast and is easy to maintain. However, the network
is far slower than local communication between
processor cores. A random cut on a graph is a good
approximation to the MAXCUT problem and is the exact
opposite of what one should do if one cares about
communication volumes. Even marginal improvements
in the partitioning can lead to large improvements in
run time for distributed algorithms [25].

The communication problem is a major motivator of 
the study of graph partitioning. The constraints of dis- 
tributed computing and the fact that the graph data 
arrives as a stream means that traditional graph parti- 
tioning algorithms that assume full access to the data 
are no longer scalable solutions. In this paper, we consider
the problem of finding an approximately balanced
k-partitioning of a graph using a streaming algorithm
with only one pass over the data, as this models
partitioning a graph while loading it onto a cluster.

Previous work addressed this problem from an experimental
perspective. [25] evaluates 16 different partitioning
heuristics on 21 different graphs to find how well each
performs when compared with an offline partitioning
heuristic (METIS [14]). A greedy algorithm assigns a vertex
to the partition where it currently has the most edges.
Surprisingly, a simple variant of greedy performed the best,
even beating an adaptation of a local partitioning
algorithm, EvoCut [3]. Also surprising was that adding
randomization to the same algorithm caused it to perform
significantly worse. The addition of randomness often allows
us to design more effective algorithms, not less effective
ones. In this paper, we seek to provide a theoretical
foundation for understanding these results and motivate
further study into more sophisticated algorithms.

Contributions. 

This paper focuses on developing a rigorous under- 
standing of two greedy streaming balanced graph par- 
titioning algorithms. We first give lower bounds on the 
approximation ratio that any streaming algorithm for 
balanced graph partitioning can obtain on both a random
and adversarial ordering of the graph. In response
to this lower bound, we focus our attention on a class
of random graphs with embedded balanced k-cuts. We
analyze our greedy algorithms by using a novel coupling
to finite Polya urn processes. This elucidating
connection gives clear intuition as to why one algorithm
performs well while the other does not. We finish with
an experimental evaluation of the bounds attained by
the theorems.

2. RELATED WORK 

Many variants of graph partitioning have been studied
since the earliest days of Computer Science. The
variant considered in this paper, balanced k-cut, has
been shown to be NP-hard by Andreev and Räcke [i],
even when one relaxes the balance constraint. They
also give an LP-based solution that obtains an O(log n)
approximation. Another full-information solution was
found by Even et al., who use an LP solution based on
spreading metrics to also obtain an O(log n)
approximation algorithm [12]. If one ignores the balance
constraint, a popular approach is to use the top k
eigenvectors. Recently, this approach was theoretically
validated as an extension of Cheeger's inequality [16].
One can also use any balanced 2-partitioning algorithm
to obtain an approximation to a balanced k-partitioning
when k is a power of 2, losing at most an
additional log n factor [s].

From a heuristic perspective, there are numerous full 
information graph partitioning systems available that 
do not have theoretical performance guarantees. These
include METIS [14], PMRSB [Tj, and Chaco [13].



Another approach, relevant for our limited informa- 
tion setting, is local partitioning algorithms. The goal 
here is not to obtain a balanced cut but given a starting 
node to find a good cut around that node. Spielman
and Teng were the first to develop this style of
algorithm [24]. Andersen, Chung and Lang improved upon
Spielman and Teng's work by using personalized PageRank
vectors to find a good local cut [2]. Addressing the
same problem, Andersen and Peres use the evolving
set process to obtain similar results [3]. While local
partitioning is similar in spirit, it is not the same as a
streaming algorithm.

The main focus of this paper is on streaming algorithms,
and there is significant related work in this area
as well. First, noting the connection between graph
partitioning and PageRank is Das Sarma et al.'s work
on computing the PageRank of a graph with multiple
passes. Closer to our setting, Bahmani et al.
incrementally compute an approximation of the PageRank
vector with only one pass [5]. However, just computing
the approximate PageRank vector is not sufficient
for finding a graph partitioning with only one pass over
the data. Das Sarma et al. extend their techniques to
find sparse cut projections within subgraphs, again using
multiple passes over the stream [9]. Cut projections
are not the same as finding balanced cuts.

An alternate model, semi-streaming, assumes that we
have O(n polylog n) storage space so that all vertices
can be stored but the edges arrive in some order. In
this setting, Ahn and Guha [1] give a one pass O(n/ε^2)
space algorithm that sparsifies a graph such that each
cut is approximated to within a (1 + ε) factor. Kelner
and Levin [15] produce a spectral sparsifier with
O(n log n/ε^2) edges in O(m) time. While sparsifiers are
a great way of reducing the size of the data, this
reduction would then require an additional pass over the
data to compute a partitioning, which is out of the scope
of the problem at hand. Finally, lower bounds are known
with regard to the space complexity of both the problem
of finding a minimum and maximum cut. Zelke [27]
has shown that this cannot be computed in one pass
with o(n^2) space.

Finally, analyzing algorithms on random graph mod- 
els has a long history. In particular, it is quite common 
to analyze graph partitionings on random graphs with 
planted partitions [21, 19]. This is done because
recovering a planted partition is equivalent to finding the
'right' answer.

3. NOTATION AND DEFINITIONS 

We now introduce the notation and definitions used 
throughout the rest of the paper. The balanced graph 
partitioning problem takes as input a graph G, an
integer k and an allowed imbalance parameter ε. The
goal is to partition the vertices of G into k sets, each
no larger than (1 + ε) n/k vertices, while minimizing the
number of edges cut.
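
As a small illustration of the objective and the balance constraint just defined, the sketch below counts cut edges and checks the (1 + ε) n/k size bound. The function names `cut_size` and `is_balanced`, and the representation of a partitioning as a dict from vertex to partition index, are assumptions made for this example only.

```python
from collections import Counter


def cut_size(edges, assignment):
    """Count edges whose two endpoints lie in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def is_balanced(assignment, n, k, eps):
    """Check that every partition holds at most (1 + eps) * n / k vertices."""
    capacity = (1 + eps) * n / k
    loads = Counter(assignment.values())
    return all(load <= capacity for load in loads.values())


# A 6-cycle split into two contiguous halves cuts exactly two edges.
cycle_edges = [(i, (i + 1) % 6) for i in range(6)]
halves = {v: 0 if v < 3 else 1 for v in range(6)}
# Alternating the vertices between the two partitions cuts every edge.
alternating = {v: v % 2 for v in range(6)}
```

Both assignments are perfectly balanced, yet their cut sizes differ by a factor of three, which is the gap a good partitioner is meant to close.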

Graph Models. 

A graph G = (V, E) consists of n = |V| vertices and
m = |E| edges. Γ(v) is the set of vertices that a vertex
v neighbors. We consider graphs generated by two random
models. The first, G(n,p), is the traditional Erdős-Rényi
model with n vertices. The traditional definition
is that each of the possible (n choose 2) edges is included
independently with probability p. At certain points in the
proofs in Section 5, we modify this definition to make
it better match our streaming model. In particular, we
allow multiple edges in order to maintain independence
in our analysis.

G(Ψ,P) is a generalization of G(n,p), due to McSherry [21],
that allows the graph to have l different
Erdős-Rényi components, each with different parameters.
Again, we have n vertices. Ψ : {1, 2, ..., n} →
{1, 2, ..., l} is a function mapping the vertices into l
disjoint clusters. Let G_i refer to the set of vertices mapped
to i, i.e. Ψ^{-1}(i) = G_i. P is an l × l matrix where edges
between vertices in G_i are included independently with
probability P_{i,i} and edges between vertices in G_i and
G_j are included with probability P_{i,j}. There are many
ways for G(Ψ,P) to generate graphs in G(n,p): we could
map all vertices into the same cluster, or we could have
P_{i,i} = P_{i,j} = p for all i, j. We make the same
modification to the generative process as in G(n,p) and allow
multiple edges for clarity of the analysis.
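
The generative model can be sketched in a few lines. This is a minimal sampler under stated assumptions: it follows the traditional simple-graph definition (each pair considered once), not the multi-edge modification used in the proofs, and the name `sample_planted_graph` and the list-based encoding of the cluster map and probability matrix are hypothetical.

```python
import random


def sample_planted_graph(psi, P, rng):
    """Sample a simple graph from the planted-partition model.

    psi[v] is the cluster index of vertex v, and P[a][b] is the
    probability of an edge between a vertex in cluster a and a
    vertex in cluster b (P is assumed symmetric).
    """
    n = len(psi)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # Each pair is included independently of all others.
            if rng.random() < P[psi[u]][psi[v]]:
                edges.append((u, v))
    return edges


# Two clusters of three vertices: dense inside, sparse across.
psi = [0, 0, 0, 1, 1, 1]
P = [[0.9, 0.05], [0.05, 0.9]]
graph = sample_planted_graph(psi, P, random.Random(7))
```

Setting every entry of P to the same value p recovers an ordinary G(n,p) sample, matching the remark above.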

Probability Distributions. 

We only use variables drawn from a binomial distribution,
where X ~ B(n,p) is a random variable representing
n independent trials, each with probability p of
success.

3.1 Polya Urn Processes 

The classical Polya Urn problem is: Given finitely
many initial bins, each containing one ball, let additional
balls arrive one at a time. For each new ball, with
probability p create a new bin and put the ball in it.
With probability 1 − p, place the ball in an existing
bin with probability proportional to m^γ, where m is the
number of balls currently in that bin.

Many variants of the above process have been analyzed.
In particular, Chung, Handjani, and Jungreis [8]
analyze the finite Polya urn process where p = 0. The
exponent γ plays an important role in the behavior of
this process. With k bins, when γ < 1, in the limit,
the load of each bin is uniformly distributed and each
contains a 1/k fraction of the balls. When γ > 1, in the
limit, the fractional load of one bin is 1. When γ = 1,
the limit of the fractional loads exists but is distributed
uniformly on the simplex.
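
The finite process (p = 0) is easy to simulate, which can help build intuition for the three regimes of γ. The following is a minimal sketch, not the construction used in the proofs; the function name `polya_urn` and its parameters are assumptions for illustration.

```python
import random


def polya_urn(k, gamma, steps, rng):
    """Simulate a finite Polya urn with k bins and exponent gamma.

    Every bin starts with one ball. Each new ball joins bin i with
    probability proportional to loads[i] ** gamma.
    """
    loads = [1] * k
    for _ in range(steps):
        weights = [m ** gamma for m in loads]
        i = rng.choices(range(k), weights=weights)[0]
        loads[i] += 1
    return loads


# Compare the fractional loads for the three regimes of gamma.
for gamma in (0.5, 1.0, 2.0):
    loads = polya_urn(k=4, gamma=gamma, steps=2000, rng=random.Random(1))
    print(gamma, [round(m / sum(loads), 3) for m in loads])
```

A single run only hints at the limiting behavior; repeating the γ = 1 case with different seeds is a reasonable way to see that the limit itself is random.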

Our proof technique will focus on connecting the streaming
graph partitioning algorithms with the finite Polya
urn process and use many of the results from [8]. We
restate the results used here:

Theorem 1 (Theorem 2.1 from [8]). Consider a
finite Polya process with exponent γ = 1 and k bins, and let
x_i^t denote the fraction of balls in the i-th bin at time t.
Then almost surely for each i, the limit X_i = lim_{t→∞} x_i^t
exists. Furthermore these limits are distributed uniformly
on the simplex {(X_1, X_2, ..., X_k) : X_i > 0,
X_1 + X_2 + ... + X_k = 1}.

Theorem 2 (Theorem 2.2 from [8]). Consider a
finite k-bin Polya process with exponent γ and let x_i^t
denote the fraction of the balls in bin i at time t. Then
a.s. the limit X_i = lim_{t→∞} x_i^t exists for each i. If γ > 1
then X_i = 1 for one bin, and X_i = 0 for all others. If
γ < 1 then X_i = 1/k for all bins.

Lemma 1 (Lemma 2.3 from [8]). Given a finite
or infinite Polya process with exponent γ and an arbitrary
initial configuration (i.e. finitely many balls arranged
in finitely many bins), suppose we restrict attention
to any particular subset of the bins and ignore
any balls that are placed in the other bins. Then the
process behaves exactly like a finite Polya process with
exponent γ on this subset of bins, though the process
may terminate after finitely many balls.

Lemma 1 is particularly important to our analysis as
it forms the basis of an inductive argument to extend
the analysis in [8] to k bins from 2 bins. We also use the
claim that a finite, arbitrary initial configuration does
not affect the distribution in the limit.

3.2 The Streaming Model 

We consider a streaming graph model where the vertices
arrive in some order. The two stream orderings
we consider are adversarial and random. For n vertices,
the set of permutations S_n defines all possible
orderings. For a random ordering, each permutation is
picked with equal probability. An adversarial ordering
is any probability distribution over the permutations,
including one that picks the worst possible ordering for
the algorithm.

When a vertex arrives so do all of its incident edges. 
Our goal is to generate a balanced vertex partition- 
ing of the graph with k partitions. The capacity of 
each partition, C, is enough to hold all the vertices, i.e.
kC = (1 + ε)n. We assume an undirected graph since
our evaluation metric, the number of edges cut, is not 
affected by the directionality of an edge. 

We chose this model because we are concerned with 
the problem of loading data onto a cluster and parti- 
tioning at the same time. We assume that only one 
pass can be made over the data and the algorithm has 
access to the current load of each machine on the cluster 
and the location of each vertex that has been previously 
seen. A vertex is not moved after it has been placed into 
some partition. 

4. LOWER BOUNDS 

Given our streaming model, the first important ques- 
tion is whether any algorithm can do well on all graphs. 
The unfortunate answer is no. Intuitively, with only one 
pass, important edges may be hidden either intention- 
ally by an adversary or unintentionally by randomness. 

Theorem 3. One-pass streaming balanced graph partitioning
with an adversarial stream order cannot be
approximated within o(n).

Proof. Without loss of generality, we seek a balanced
2-partitioning. Consider a graph that is a cycle
over n vertices with edges such that (i, (i + 1) mod n) ∈
E for 1 ≤ i ≤ n. Let the ordering be all odd nodes, then
all even, i.e. 1, 3, 5, ..., n − 1, 2, 4, 6, ..., n. Assume that n is
even. The optimal balanced partitioning cuts 2 edges.
However, the given ordering reveals no edges until n/2
vertices arrive. Until the edges arrive, we have no way
of distinguishing which vertices are 'near' each other. In
Algorithm 1 arg max Greedy

Input: G, k, C, π
P_1, ..., P_k = ∅
for t = 1, 2, ..., n do
    for i = 1, 2, ..., k do
        S_i = |Γ(π(t)) ∩ P_i|
        if |P_i| = C then
            S_i = 0
    if all S_i = 0 then
        Pick i from argmin_{j∈[k]} {|P_j|} u.a.r.
    else
        Pick i from argmax_{j∈[k]} {S_j} u.a.r.
    P_i = P_i ∪ {π(t)}

Algorithm 2 Proportional Greedy

Input: G, k, C, π
P_1, ..., P_k = ∅
for t = 1, 2, ..., n do
    for i = 1, 2, ..., k do
        S_i = |Γ(π(t)) ∩ P_i|
        if |P_i| = C then
            S_i = 0
    if all S_i = 0 then
        Pick i from argmin_{j∈[k]} {|P_j|} u.a.r.
    else
        Pick i proportional to S_i
    P_i = P_i ∪ {π(t)}


particular, note that this ordering is indistinguishable
from one where the odd vertices are given in a random
order, or one where the odd nodes are interspersed with
unconnected even nodes, i.e. 1, n − 2, 3, n − 4, 5, n − 6, ....
Thus, no algorithm can do better than cutting n/4 edges
in expectation. This generalizes to k partitions. □
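
The adversarial ordering in this proof can be checked directly on a small cycle. The sketch below, with hypothetical helper names `cycle_neighbors` and `edges_revealed`, records how many edges to previously seen vertices each arrival brings when all odd vertices are streamed before all even ones.

```python
def cycle_neighbors(v, n):
    """Return the two neighbors of vertex v on the cycle 1, 2, ..., n."""
    return {(v - 2) % n + 1, v % n + 1}


def edges_revealed(order, n):
    """For each arrival, count its edges to vertices that arrived earlier."""
    seen = set()
    revealed = []
    for v in order:
        revealed.append(len(cycle_neighbors(v, n) & seen))
        seen.add(v)
    return revealed


n = 10
# All odd vertices first, then all even vertices.
order = list(range(1, n + 1, 2)) + list(range(2, n + 1, 2))
revealed = edges_revealed(order, n)
```

The first n/2 arrivals carry no edges at all, so any placement rule is effectively guessing for half of the stream; every edge is revealed only once the even vertices start to arrive.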

Theorem 4. One-pass streaming balanced graph partitioning
with a random stream order cannot be approximated
within o(n).

Proof. Again, we seek a balanced 2-partition for a
cycle graph with a random ordering. Consider the t-th
vertex to arrive in this ordering.

Pr[t arrives with no edges] = Pr[both neighbors arrive after t]
    = ((n − t)/n) · ((n − t + 1)/(n − 1)),

so the number of vertices that we expect to arrive with
no edges is

E[# with no edge] = Σ_{t=1}^{n} ((n − t)/n) · ((n − t + 1)/(n − 1))
    = (n + 1)/3.

Therefore, asymptotically, we expect n/3 vertices to arrive
with no edges. As before, when a vertex arrives
with no edges, we are not able to determine which other
vertices it is 'near'. For each of these, we expect to cut
1 edge, providing us with our lower bound. □
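
The linear count of edge-free arrivals can be sanity-checked by brute force on a small cycle. A vertex arrives with no edges exactly when it precedes both of its neighbors, and by symmetry each of the three is equally likely to come first, so the mean is n/3 for n ≥ 3. The sketch below enumerates all orderings; the function names are assumptions, and the enumeration is only feasible for very small n.

```python
from fractions import Fraction
from itertools import permutations


def isolated_arrivals(order, n):
    """Count vertices of the n-cycle (labelled 0..n-1) that arrive
    before both of their neighbors."""
    position = {v: t for t, v in enumerate(order)}
    return sum(
        1
        for v in range(n)
        if position[v] < position[(v - 1) % n]
        and position[v] < position[(v + 1) % n]
    )


def expected_isolated(n):
    """Average the count over all n! stream orderings."""
    total = 0
    count = 0
    for order in permutations(range(n)):
        total += isolated_arrivals(order, n)
        count += 1
    return Fraction(total, count)


print(expected_isolated(6))
```

Exact fractions are used instead of floating point so that the average can be compared with n/3 without rounding concerns.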

In the following sections, we only analyze the algo- 
rithms for random orderings. In particular, we will 
show that for random graphs with higher degree and 
a planted partition, arg max Greedy can recover the
partitioning.

5. ANALYSIS OF ALGORITHMS ON RANDOM GRAPHS

The experiments in [25] showed that one heuristic
studied in the paper, Linear Deterministic Greedy (LDG),
was clearly the best tried. However, another heuristic,
Linear Randomized Greedy (LRG), differs only in that it
selects a partition proportionally to the distribution of
edges instead of from the maxima. In the experiments,
LDG performed significantly better than LRG. This
raises the question: can we theoretically explain the
difference in performance? In this section, we will
introduce slightly simpler variants, arg max Greedy
(corresponding to LDG) and Proportional Greedy (corresponding
to LRG), and analyze their performance on McSherry's
random graph model. Our analysis will clearly
demonstrate the difference observed in the experiments.

5.1 Algorithms 

The two algorithms studied in this paper are very
similar: when a vertex v arrives, a score S_i =
|Γ(v) ∩ P_i|, the number of edges from v to P_i, is
calculated for each partition P_i. If the partition is full, its
score is set to 0. If all scores are 0, then the vertex
is assigned to some partition with minimal load. If a
score is non-zero, then the arg max Greedy Algorithm
assigns the vertex uniformly at random to a partition
in argmax_i S_i. By contrast, the Proportional Greedy
Algorithm uses the scores as a distribution, assigning the
vertex to partition i with probability S_i / Σ_j S_j.
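
The description above translates fairly directly into code. The following is a minimal sketch of both variants under stated assumptions: the graph is given as a dict of neighbor sets, k · capacity is at least the number of vertices, and the names `stream_partition` and `rule` are hypothetical, not taken from [25].

```python
import random


def stream_partition(neighbors, order, k, capacity, rule, rng):
    """One-pass greedy partitioning; rule is "argmax" or "proportional".

    neighbors maps each vertex to the set of its adjacent vertices.
    Returns a dict mapping each vertex to its partition index.
    """
    loads = [0] * k
    where = {}
    for v in order:
        # Score each partition by the number of already placed neighbors.
        scores = [0] * k
        for u in neighbors[v]:
            if u in where:
                scores[where[u]] += 1
        # A full partition gets score 0.
        for i in range(k):
            if loads[i] >= capacity:
                scores[i] = 0
        if not any(scores):
            # No usable edges: pick a least loaded partition u.a.r.
            low = min(loads)
            choice = rng.choice([i for i in range(k) if loads[i] == low])
        elif rule == "argmax":
            best = max(scores)
            choice = rng.choice([i for i in range(k) if scores[i] == best])
        else:
            # Proportional: use the scores as an unnormalized distribution.
            choice = rng.choices(range(k), weights=scores)[0]
        where[v] = choice
        loads[choice] += 1
    return where


# Two disjoint triangles, streamed so that each triangle is seeded first.
triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
placement = stream_partition(triangles, [0, 3, 1, 2, 4, 5], 2, 3, "argmax",
                             random.Random(0))
```

On this toy input both rules separate the triangles, because every later vertex only has edges into one partition; the two rules differ only when a vertex has edges into several partitions.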

The versions of these algorithms from [25] differ only
in that the score for each partition is weighted by the
current load of the partition, i.e. S_i(1 − |P_i|/C). In
practice, the algorithms keep the partitions nearly balanced,
meaning this weighting acts only as a tiebreaker: it matters
when several partitions have the same number of edges, and
when there are no edges, in which case [25]
prefers the least-loaded partitions.

One of the key insights of this paper is that when
these algorithms are used on random graphs, we can
write both down as random processes. In particular,
we can let the random process generate the graph while
also partitioning it at the same time. This reduction
will be discussed in Section 5.3. The proof proceeds
by analyzing the random process versions of the
algorithms, rather than those given in Algorithms 1 and 2.
The random processes generate a multi-edge G(n,p)
graph. For the extended G(Ψ,P) analysis, we will only
Algorithm 3 arg max Greedy Process on G(n,p)

Input: p
Set P_1, P_2, ..., P_k = ∅
for t = 1, 2, ..., n do
    For 1 ≤ i ≤ k, draw E_i^(t) ~ B(|P_i|, p)
    if Σ_{i=1}^{k} E_i^(t) = 0 then
        Assign t to argmin_{j∈[k]} {|P_j|}
    else
        Assign t to argmax_{j∈[k]} {E_j^(t)}



consider Algorithm 1 and the correctly modified version
of Algorithm 3. The modification is only with the
generation of the E_i^(t) and will be discussed later.

5.2 Result and Proof Outline 

The rest of the paper will focus on proving the following
two statements. The first is that the Proportional
Greedy Algorithm cannot recover an embedded partition
in a G(Ψ,P) graph, no matter what the parameters
are or how big the graph is. By contrast, the second
result is that the arg max Greedy Algorithm can recover
the embedded partition, provided the components are
dense enough, the cut between them is sparse enough,
and there are enough components.

Theorem 5. Let p be the probability of edges within 
components and q be the probability of edges between 
components. Given a G{'^,P) graph with I > klogk 
equally sized components where p > ^j^" , p > 3{k + 

Vk+l)lq, andq^ 0{{k^-^ log ly^), arg max Greedy Al- 
gorithm will recover an embedded partition from a ran- 
dom stream ordering. 

The proof proceeds in several stages. First, we ignore
the capacity constraint and consider Algorithms 3
and 4 on a single G(n,p) component. Does the algorithm
eventually learn it is a component and place it
in the same partition? We show that Algorithm 4 is
equivalent to a finite Polya urn process with γ = 1 and
distributes the component over all the partitions. By
contrast, Algorithm 3 can be coupled to a finite Polya
urn process with γ > 1. It will asymptotically place the
entire G(n,p) component in one partition. This argument
starts with 2 partitions and is extended to k bins
using an induction argument.

That Algorithm 3 will correctly (not) partition a connected
component forms the basis of our argument that
it can be extended to the G(Ψ,P) model. Intuitively,
with the correct parameters, each component of G(Ψ,P)
will be placed in a single partition. The primary technical
difficulties faced are the inclusion of the capacity
constraint, requiring bounds on the component sizes,
and the addition of inter-cluster edges, which serve to
'confuse' the algorithm about to which component a



Algorithm 4 Proportional Greedy Process on G(n,p)

Input: p
Set P_1, P_2, ..., P_k = ∅
for t = 1, 2, ..., n do
    For 1 ≤ i ≤ k, draw E_i^(t) ~ B(|P_i|, p)
    if Σ_{i=1}^{k} E_i^(t) = 0 then
        Assign t to argmin_{j∈[k]} {|P_j|}
    else
        Assign t to P_j with probability E_j^(t) / Σ_{i=1}^{k} E_i^(t)



vertex belongs. By setting the parameters of the model 
correctly, we can overcome these challenges. 

5.3 Analysis on a Single G(n,p) Component

We now analyze Algorithms 3 and 4. These are obtained
from Algorithms 1 and 2 by considering the process
in terms of Polya urns. As a reminder, the finite
Polya urn process has k bins and the ball is assigned
to bin i with probability proportional to (m_i^(t))^γ, where
m_i^(t) is the load of the bin at time t.

Translating Algorithms 1 and 2 to Polya urn processes
involves identifying each ball with a vertex and
each bin with a partition. There are two primary
differences from the standard Polya urn process. First,
with probability (1 − p)^t, the t-th vertex (ball) does not
have edges to vertices already seen and it is placed in
the least loaded partition (urn). The second is that we
do not assign the vertex (ball) based on the load of the
partition (urn) but instead on a binomial random variable
based on the load. Specifically, let E_1^(t), ..., E_k^(t) be
the random variables representing the number of edges
to each of the k partitions. Each E_i^(t) is drawn from
B(m_i^(t), p). The following connection is how we created
Algorithms 3 and 4:

• Algorithm 1 assigns the vertex to a partition in
argmax_{j∈[k]} {E_j^(t)}, breaking ties at random.

• Algorithm 2 assigns it to bin i proportional to E_i^(t).
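
Because the process versions only track partition loads, they can be simulated without ever materializing a graph. The sketch below is an illustration under assumptions: the names `binomial` and `greedy_process` are hypothetical, the binomial draw is a plain sum of Bernoulli trials (quadratic total time, fine for small n), and ties among least loaded partitions are broken by lowest index, which the pseudocode leaves unspecified.

```python
import random


def binomial(trials, p, rng):
    """Draw from B(trials, p) as a sum of independent Bernoulli trials."""
    return sum(1 for _ in range(trials) if rng.random() < p)


def greedy_process(n, k, p, rule, rng):
    """Simulate the arg max ("argmax") or proportional greedy process.

    Returns the final loads of the k partitions after n arrivals.
    """
    loads = [0] * k
    for _ in range(n):
        # Edges from the new vertex into each partition.
        edges = [binomial(m, p, rng) for m in loads]
        if sum(edges) == 0:
            # No edges: place the vertex in a least loaded partition.
            choice = loads.index(min(loads))
        elif rule == "argmax":
            best = max(edges)
            choice = rng.choice([i for i in range(k) if edges[i] == best])
        else:
            choice = rng.choices(range(k), weights=edges)[0]
        loads[choice] += 1
    return loads


for rule in ("argmax", "proportional"):
    print(rule, greedy_process(n=500, k=4, p=0.05, rule=rule,
                               rng=random.Random(3)))
```

Comparing the printed loads across a few seeds is a quick, informal way to see how differently the two rules spread a single component.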

Algorithm 4 Analysis. Consider the total number
of edges from vertex t as a random variable E^(t) ~
B(t,p). Each edge is distributed according to m^(t), i.e.
with probability m_i^(t)/t it connects to the i-th bin. Each
of the edges is distributed i.i.d. and given
equal weight, so Algorithm 2 assigns balls proportional
to (m_i^(t))^γ where γ = 1.

Theorem 6 (Algorithm 4 on G(n,p)). Let 0 < p <
1. Let x_i^t be the fractional load of partition i at time
t of Algorithm 4. Then almost surely lim_{t→∞} x_i^t = X_i
exists and for all i, X_i > 0.
Proof. We show that when there are edges, this process
is exactly a finite Polya urn process with γ = 1.
The result then follows directly from Theorem 1. Let
there be k bins. At time t, each has load m_i^(t). Let
E^(t) be the total number of edges drawn by the process.
Assume E^(t) > 0, as E^(t) = 0 will be dealt with
later. Recall that we allow multiple edges in our model,
so consider the edges being distributed to the k partitions
with replacement, i.e. each of the E^(t) edges goes
to partition i with probability m_i^(t)/t. Let E_i^(t) be the
number to partition i. Note that Σ_i E_i^(t) = E^(t).

Now Pr[Algorithm 4 picks bin i] = E_i^(t)/E^(t). However,
E_i^(t) ~ B(E^(t), m_i^(t)/t), showing that this assignment
is proportional to m_i^(t) as desired. This is exactly
a finite Polya urn process with γ = 1.

The remaining details concern the modification of the
process when E^(t) = 0. In this case, the algorithm will
assign the vertex to the least loaded bin. If this situation
has a constant probability throughout the process,
then it is making the distribution of the balls more uniform,
satisfying the theorem statement that all bins
contain a non-zero fraction of the balls. If it is the case
that this becomes unlikely as the process progresses, i.e.
p > ^^^^ , then we can apply Theorem [l] and Lemma [l] 
from ^ to say that after 0(^^|^) vertices have arrived, 
we begin the γ = 1 Polya urn process with an arbitrary
finite initial configuration. From Theorem 1 we
get that X_i > 0 for all i. □

We conclude that the randomized algorithm does not 
have a concentration result. No matter the value of p 
or the size of the graph, for a G{n,p) component, the 
Proportional Greedy algorithm will not learn that it is a 
component and instead distributes it over all partitions. 

Corollary 1. Given a single isolated G(n,p) component,
for any value p, Algorithm 4 will distribute this
component over all k partitions.

Algorithm 3 Analysis. The key insight about why
Algorithm 3 provides a concentration result is that by
preferring the arg max of the distribution of edges, once
some partition has a slightly higher load than the other
it is very likely to be assigned the next vertex. As the
gap in the loads grows, the larger partition becomes
increasingly more likely to receive the next vertex until
it is impossible for the smaller partition to compete.
However, there are a few challenges.

The first is that with probability (1 − p)^t the (t + 1)-th
vertex does not have any edges to previously seen
vertices. In this case, it is automatically placed in the
least loaded bin. When this happens, it decreases the
gap in the loads. If it happens too often, the gap will



this does not happen with high probability, provided 
p > We only expect ^ vertices to arrive with no 

edges and they are concentrated at the beginning of the 
process when t < 

The second challenge is that when the vertex has
1 edge, the arg max distribution is the same as
Algorithm 4. However, this can be dealt with in the same
manner as having no edges. Again, we expect ^ vertices 
to have only 1 edge and primarily when ^ < t < |. 

Therefore, we need p > ^ " . 

The final challenge is that we are not able to couple
Algorithm 3 to a finite Polya urn process with γ > 1
until - vertices have arrived, meaning we do not start 
with a uniform load distribution. Lemma 1 shows that
we can start with an arbitrary finite initial configuration
and obtain the same concentration results.

Theorem 7. Let p be any value between ^ " and 
1. Let x_i^t be the fractional load of partition i at time
t of Algorithm 3. Then almost surely lim_{t→∞} x_i^t = X_i
exists and one X_i = 1, while all others are 0.

This statement follows from Theorem 2. Our analysis
for Algorithm 3 relies on the probability that bin i will
receive a ball at time t, or

Pr[E_i^(t) = argmax_j {E_j^(t)}]

for E_i^(t) ~ B(m_i^(t), p). It is intuitive that bins with
a higher load should have a much higher probability of
being the arg max, yet the binomial distribution does
not have a nice closed form expression for Pr[X > k].
Even if we condition on E^(t) = Σ_{i=1}^{k} E_i^(t) = x so we
can express the E_i^(t) as a multinomial distribution, a
nice closed form solution eludes us.

Therefore, our proof consists of several lemmas. 

Lemma 2. Given a G{n,p) graph with p > ^ " , 
after 0{^-^^^) steps, Algorithm^ with 2 partitions can 
be coupled to a finite Polya urn process with γ > 1.

Proof. Let A = E_1^(t) and B = E_2^(t), and let A^j, B^j be
the loads conditioned on the fact that E^(t) = j, i.e.
A^j + B^j = j. Let δ be the comparative advantage of A
over B, i.e. 1/2 + δ = m_1^(t)/t and 1/2 − δ = m_2^(t)/t. We want to
analyze Pr[A^j > B^j].



not grow. Since (1 — p)* w e p*, once t 



0( 



Pr[A^j > B^j] = Σ_{i=⌊j/2⌋+1}^{j} (j choose i) (1/2 + δ)^i (1/2 − δ)^{j−i}
    = (1/2 + δ)^{⌊j/2⌋+1} Σ_{i=0}^{⌊j/2⌋} (j choose i) (1/2 + δ)^{⌊j/2⌋−i} (1/2 − δ)^i.

We similarly express Pr[B^j > A^j] as follows.

Pr[B^j > A^j]
    = (1/2 − δ)^{⌊j/2⌋+1} Σ_{i=0}^{⌊j/2⌋} (j choose i) (1/2 − δ)^{⌊j/2⌋−i} (1/2 + δ)^i.

Because 1/2 + δ > 1/2 − δ, we have that
Σ_{i=0}^{⌊j/2⌋} (j choose i) (1/2 + δ)^{⌊j/2⌋−i} (1/2 − δ)^i
    > Σ_{i=0}^{⌊j/2⌋} (j choose i) (1/2 − δ)^{⌊j/2⌋−i} (1/2 + δ)^i.

Therefore,

Pr[A^j > B^j] / Pr[B^j > A^j] > (1/2 + δ)^{⌊j/2⌋+1} / (1/2 − δ)^{⌊j/2⌋+1}.

From this, and the fact that these two quantities sum
to 1, we conclude that

Pr[A^j > B^j] > (1/2 + δ)^{⌊j/2⌋+1} / ((1/2 + δ)^{⌊j/2⌋+1} + (1/2 − δ)^{⌊j/2⌋+1}).

This lower bound is the probability that the ball goes
in urn 1 in a Polya process with γ = ⌊j/2⌋ + 1. When
j ≥ 2, we can couple our process to a finite Polya urn
process with a desirable concentration result. We remove
the conditioning on E^(t) = j to get Pr[A > B]:

Pr[A > B] = Σ_{j=1}^{t} (t choose j) p^j (1 − p)^{t−j} Pr[A^j > B^j]    (1)
i=i ^-^^ 

The only case where we are mixing in a process that
has an undesirable exponent (γ = 1) is when j = 0
or 1. The probability of this case is less than - when 



t > 



2 log n 
P 



According to Lemma ll this constitutes 
a finite arbitrary configuration and the concentration 

results hold after t > □ 

■p 

The above proof shows that, at some point, the algorithm
can be coupled with a finite Polya urn process
with γ > 1. However, we need Lemma 1 from [8]
to show that the initial configuration when the process
takes off does not affect the concentration results.
Moreover, we bound the total expected number of vertices
to arrive with j = 0 or 1 by



f-(f-p)" + f-p 



P P P 

Combining Lemmas 1 and 2 shows that for 2 partitions
Algorithm 3 will concentrate the process into 1 bin. In
order to extend the process to k partitions, we present
the following Lemma. It follows the proof technique of
Theorem 2 in [8] and utilizes Lemma 1.

Lemma 3. Consider Algorithm^with k partitions on 
a G{n,p) graph with p > ^ '"^^ " . Let x\ be the fractional 
load of the i-th partition at time t. Then a.s. the limit
X_i = lim_{t→∞} x_i^t exists for each i. For exactly one i,
X_i = 1.

Proof. To extend the analysis of Lemma 2 from 2
partitions to k, we use induction and condition on each
pair of bins. Of the k bins, select 2 and call them A and
B. We modify Lemma 2's Equation 1 by substituting

Pr[E^(t) = j] = (t choose j) p^j (1 − p)^{t−j}

with

Pr[E^(t) = j | A or B is in the argmax].

Given that our coupling to the Polya urn process is
unaffected, we just must show that

Pr[E^(t) = 0, 1] ≥ Pr[E^(t) = 0, 1 | A or B is in the argmax].

The E^(t) = 0 case is simple since

Pr[E^(t) = 0 | A or B is the max] = 0,

since we only use the argmax process when E^(t) ≥ 1
(otherwise we would have assigned the vertex to the
least loaded partition). When E^(t) = 1, this is equivalent
to exactly 1 edge being placed and the probability
that, of the k bins, it selects an endpoint in A or B is



exactly $\frac{2}{k}$. Thus

$$\Pr[E^{(t)} = 1 \mid A \text{ or } B \text{ is the max}] = \frac{2}{k}\, t p (1-p)^{t-1} < t p (1-p)^{t-1} = \Pr[E^{(t)} = 1].$$

The result now follows from Theorem 2. □

Proof of Theorem 7. Combining Lemmas 1, 2 and 3, we conclude that Algorithm 3 with k partitions will asymptotically approach a fractional load of 1 in one partition when run with $p > \frac{2 \log n}{n}$. □



Corollary 2. Given a single G(n,p) component, for any value $p > \frac{2 \log n}{n}$, Algorithm 3 will eventually concentrate this component into 1 partition as $n \to \infty$.

This analysis leaves open the question of how long the process must run before one partition dominates the others. This question has been studied by Drinea, Frieze and Mitzenmacher [11]. While they analyze the convergence rates for 2 bins, the proofs can be extended to k bins via the union bound. In the theorem, $B_0$ is the name for one of the two bins, and all-but-$\delta$ dominant means that $B_0$ contains at least a $1 - \delta$ fraction of the balls thrown. $\epsilon_0$ is the initial amount that the two bins are separated by after n balls and is a constant depending on $\lambda$, say $\frac{1}{100\lambda}$.



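Theorem 8 (Theorem 2.4 from [11]). Assume that we throw balls into the system until $B_0$ is all-but-$\delta$ dominant for some $\delta > 0$. Then, if $\lambda > 1$, with probability $1 - e^{-\Omega(n_0)}$, $B_0$ is all-but-$\delta$ dominant when the system has $2^{x+z} n_0$ balls, where $x = \log_{1 + \frac{\lambda - 1}{5 + 4(\lambda - 1)}} \frac{0.4}{\epsilon_0}$ and $z = \log_{\frac{2\lambda}{\lambda + 1}} \frac{0.1}{\delta}$.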



Lemma 4 extends this theorem to k bins.

Lemma 4 (Lemma 4.1 from [11]). Suppose that when n balls are thrown into a pair of bins, the probability that neither is all-but-$\delta$ dominant is upper-bounded by $p(n, \delta)$. Here, we assume $p(n, \delta)$ is non-increasing in n. Then when $1 + kn/2$ balls are thrown into k bins, the probability that none is all-but-$\gamma$ dominant is at most $\binom{k}{2} p(n, \delta)$ for $\gamma = \delta / (\delta + (1 - \delta)/(k - 1))$.

To summarize these results on the convergence rate, 
we find that the attachment process starts in earnest 
after ^ vertices have arrived. After | vertices have ar- 
rived, we claim the exponent in the process is greater 
than 1. From Lemma 4, the probability that we do not get an all-but-$\epsilon$ domination is inversely polynomial in the number of partitions, $1/\epsilon$, and the number of vertices. The bound given by Theorem 8 holds for $\lambda = 2$ but is loose, since the $\lambda$ value increases after every round of

- vertices. 
p 

Comparisons. From these results, we conclude that the reason that Algorithm 2 fails to concentrate the component is the strict proportionality of its assignments. If instead it used any exponent greater than 1 on its scores, i.e. assigned to partition i with probability proportional to $S_i^{\gamma}$ for some $\gamma > 1$, the concentration result would hold. In particular, there is a huge spectrum of greedy algorithms of the style of arg max Greedy and Proportional Greedy. Amongst these, arg max Greedy provides the strongest possible preference towards concentration.
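The effect of the exponent can be illustrated with a small simulation. This is only a sketch of a simplified stand-in for the graph process: balls are thrown into bins with probability proportional to the current load raised to the power `gamma`, every bin starts with one ball, and the names `feedback_urn` and `largest_fraction` are hypothetical helpers introduced here for illustration.

```python
import random


def feedback_urn(num_bins, num_balls, gamma, seed=0):
    """Throw balls one at a time into bins.

    Bin i is chosen with probability proportional to loads[i] ** gamma.
    Assumption for this sketch: every bin starts with a single ball.
    """
    rng = random.Random(seed)
    loads = [1] * num_bins
    for _ in range(num_balls):
        weights = [x ** gamma for x in loads]
        chosen = rng.choices(range(num_bins), weights=weights, k=1)[0]
        loads[chosen] += 1
    return loads


def largest_fraction(loads):
    """Fraction of all balls held by the most loaded bin."""
    return max(loads) / sum(loads)


if __name__ == "__main__":
    # gamma = 1 is the proportional rule; gamma > 1 adds feedback.
    for gamma in (1.0, 2.0, 3.0):
        loads = feedback_urn(num_bins=4, num_balls=5000, gamma=gamma, seed=1)
        print(gamma, round(largest_fraction(loads), 3))
```

With `gamma = 1` the largest fraction typically settles at some value well below 1 that depends on the run, while for `gamma > 1` a single bin tends to take almost all of the balls, which is the behaviour the concentration result describes.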

5.4 Extending to $G(\Psi, P)$ graphs and capacity constraints

We showed that with no capacity constraints the arg max Greedy approach is able to asymptotically place a single G(n,p) component into one partition. Specifically, while it will initially place vertices in all partitions, once we begin to see edges, the algorithm concentrates the component into one partition. By contrast, the Proportional Greedy approach always cuts the component into k pieces. We would like to extend this analysis for arg max Greedy to graphs that consist of many good clusters, but face two challenges - the capacity constraints and the 'bad' inter-cluster edges.

These two challenges motivate our restrictions to both $\Psi$ and P. The capacity constraint can be violated if clusters are of size c, the capacity is C, and more than $\frac{C}{c}$ communities choose a specific bin to form their large component. From the traditional analysis of throwing m balls into n bins, we know that the expected maximum load (with high probability) is $\frac{\log n}{\log \log n}$ when $m = n$ and $O(\frac{m}{n})$ when $m > n \log n$ [23]. If we can argue that for each cluster the location of its large component is chosen uniformly at random from the bins, then we can use the balls and bins maximum load analysis to argue that if each cluster is small enough, the slack required, $C \approx (1 + \epsilon)\frac{n}{k}$, is also small. We also require a small amount of slack in the capacities to account for initial mistakes. These mistakes are the result of not seeing edges at the beginning of the process.

For simplicity, our proof will proceed by first assuming that all of the clusters, $C_i$, are of the same size and that q, the probability of inter-cluster edges, is 0. This will allow us to deal with running l finite Polya urn processes simultaneously and independently. After this, we show a non-zero bound on q that keeps small the probability of the process failing to find a cut on the inter-cluster edges. Finally, the assumption that the $C_i$ are of equal size can be relaxed by adjusting the parameters in P appropriately.

Lemma 5. Given a $G(\Psi, P)$ graph where $P_{i,j} = 0$ and $\forall i$, $|P_{i,i}| > 2 \log n / |C_i|$, let $x_j^{(i)(t)}$ be the fraction of $C_i$ that partition j holds at time t. With no capacity constraints, Theorem 7 will guarantee that, as n grows, for each cluster i, if $\lim_{t \to \infty} x_j^{(i)(t)} = X_j^{(i)}$, then for some j, $X_j^{(i)} = 1$ while all others are 0.

Proof. This follows directly from Theorem 7 and the fact that when $P_{i,j} = 0$, the individual components cannot interact with one another. □

Next, we relax the constraint that there are no edges 
between components to obtain a bound that still does 
not necessarily respect capacity constraints. 

Lemma 6. Given a $G(\Psi, P)$ graph with $P_{i,i} = p$, $P_{i,j} = q$, all l clusters of equal size $|C_i| = |C_j|$, and $p > \frac{2 \log n}{|C_i|}$, let $x_j^{(i)(t)}$ be the fraction of $C_i$ that partition j holds at time t. With no capacity constraints and k partitions, if $p > 3(k + \sqrt{k} + 1)lq$ then for each cluster i, if $\lim_{t \to \infty} x_j^{(i)(t)} = X_j^{(i)}$, then for some j, $X_j^{(i)} = 1$ while all others are 0.

Our goal is to bound the number of 'bad' inter-cluster edges away from the number of 'good' intra-cluster edges. We assume worst case distributions, so these bounds can safely be relaxed in practice.

Consider component $C_i$. A natural condition is that there are more expected intra-cluster edges than inter-cluster edges, so $p|C_i| > q(n - |C_i|)$. We require a few more properties. The first is that the inequality holds with reasonable probability, so $p|C_i| - \sqrt{p|C_i|} > q(n - |C_i|) + \sqrt{q(n - |C_i|)}$. The second is that we maintain the separation at every step of the execution of the process, so $p|C_i|\frac{t}{n} - \sqrt{p|C_i|\frac{t}{n}} > q(n - |C_i|)\frac{t}{n} + \sqrt{q(n - |C_i|)\frac{t}{n}}$. Finally, we also need that the total number of bad edges should be no more than the arg max of the good edges, as this guarantees that the bad edges will not affect the concentration results for each component. This adds a factor of k to the bound, so we must always guarantee there are at least k 'good' edges for each 'bad' edge.

Proof of Lemma 6. Let the edges from a vertex to its own component be 'good' edges and its external edges be 'bad'. The separation between the good edges and bad edges can be achieved through the use of Chernoff bounds. In particular, at time t, we expect that $|C_i|\frac{t}{n}$ vertices in $C_i$ will have arrived already. Let the next vertex, v, be from $C_i$. Let $E^{(t)}$ be the total number of edges from v to the $C_i$ vertices that have already arrived. Using a Chernoff bound to justify using the expectation, we claim that with probability at least $1 - \delta$,

$$E^{(t)} > p|C_i|\frac{t}{n} - \sqrt{\log(1/\delta)\, p|C_i|\frac{t}{n}}.$$

The bad edges, $B^{(t)}$, are drawn from $B(q, (n - |C_i|)\frac{t}{n})$. For clarity, we approximate $n - |C_i|$ as $l|C_i|$. Again, with probability at least $1 - \delta$, we claim that

$$B^{(t)} < ql|C_i|\frac{t}{n} + \sqrt{\log(1/\delta)\, ql|C_i|\frac{t}{n}}.$$

We set $\delta = 1/e$ to obtain constant probability at least 1/2. This assumption is supported by the experimental results in the next section. We include bounds that hold with high probability in the Appendix.

To add the constraint that the bad edges are less than $\arg\max_j \{E^{(t)}(j)\}$, we note that the worst case is that all of the bad edges connect to one partition. This can happen if the rest of the graph is not evenly distributed over the partitions, or if we are observing a deviation in the distribution of bad edges. Given this, it is sufficient that the number of bad edges is bounded away from the average number of good edges, so we use the condition that

$$p|C_i|\frac{t}{n} - \sqrt{p|C_i|\frac{t}{n}} > k\left[ql|C_i|\frac{t}{n} + \sqrt{ql|C_i|\frac{t}{n}}\right].$$

To extract meaningful restrictions on p and q from this equation, we note that $p|C_i|\frac{t}{n} - \sqrt{p|C_i|\frac{t}{n}} > k$ when $t > \frac{(k + \sqrt{k} + 1)n}{p|C_i|}$. Similarly, $ql|C_i|\frac{t}{n} + \sqrt{ql|C_i|\frac{t}{n}} < 1$ when $t < \frac{\frac{1}{2}(3 - \sqrt{5})n}{ql|C_i|}$. We find that $\frac{(k + \sqrt{k} + 1)n}{p|C_i|} < \frac{\frac{1}{2}(3 - \sqrt{5})n}{ql|C_i|}$ exactly when $p > (k + \sqrt{k} + 1)lq / (\frac{1}{2}(3 - \sqrt{5}))$. Simplifying, $p > 3(k + \sqrt{k} + 1)lq$ is sufficient. The gap between the left and right hand sides is monotonically increasing after this point, guaranteeing that all decisions will be made correctly with constant probability. □
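The two thresholds on t in the proof come from solving a quadratic in a square root. As a worked check, using the shorthand $y = p|C_i|\frac{t}{n}$ and $z = ql|C_i|\frac{t}{n}$ (introduced here only for this derivation, with $\delta = 1/e$ so that $\log(1/\delta) = 1$):

```latex
\begin{align*}
y - \sqrt{y} > k
  &\iff \sqrt{y} > \tfrac{1}{2}\bigl(1 + \sqrt{1 + 4k}\bigr)
  \iff y > k + \tfrac{1}{2} + \sqrt{k + \tfrac{1}{4}}, \\
z + \sqrt{z} < 1
  &\iff \sqrt{z} < \tfrac{1}{2}\bigl(\sqrt{5} - 1\bigr)
  \iff z < \tfrac{1}{2}\bigl(3 - \sqrt{5}\bigr) \approx 0.382.
\end{align*}
```

Since $\sqrt{k + \frac{1}{4}} \leq \sqrt{k} + \frac{1}{2}$, requiring $y > k + \sqrt{k} + 1$ is enough for the first inequality, and $1 / (\frac{1}{2}(3 - \sqrt{5})) \approx 2.618 < 3$ accounts for the constant 3 in the simplified bound.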

Provided $k > 2$, $k + \sqrt{k} + 1 < 2k$, so this bound is more simply $p > 3 \cdot 2klq = 6klq$. If we make stronger assumptions about the distribution of the vertices within the bins at any finite time, i.e. that they are approximately balanced, then we can drop the $(k + \sqrt{k} + 1)$ factor and obtain that $p > 3lq$ is sufficient.
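The bounds above are easy to evaluate numerically. The following sketch uses hypothetical helper names (`lemma_gap`, `simplified_gap`, `largest_q_simplified`) and simply restates the inequalities $p > 3(k + \sqrt{k} + 1)lq$ and $p > 6klq$ as functions; the example parameters are the ones reported for the convergence experiment.

```python
import math


def lemma_gap(k, l, q):
    """Right-hand side of p > 3 (k + sqrt(k) + 1) l q."""
    return 3 * (k + math.sqrt(k) + 1) * l * q


def simplified_gap(k, l, q):
    """Right-hand side of the simpler condition p > 6 k l q."""
    return 6 * k * l * q


def largest_q_simplified(p, k, l):
    """The value q = p / (6 k l) at the boundary of p > 6 k l q."""
    return p / (6 * k * l)


if __name__ == "__main__":
    # p = 0.75, k = 8 partitions, l = 100 components.
    print(largest_q_simplified(0.75, 8, 100))  # 0.00015625
    # For k > 2 the simplified bound is the more demanding of the two.
    print(lemma_gap(8, 100, 1e-4), simplified_gap(8, 100, 1e-4))
```

Because $k + \sqrt{k} + 1 < 2k$ only holds for $k > 2$, the simplified form should not be used for $k = 2$.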

The remaining technical point is the capacity constraints. Given that no aspect of the algorithm is dedicated towards load balancing when edges exist, our only hope can be that the components' concentration points are distributed uniformly over the partitions. If this is the case then a standard balls-and-bins analysis will tell us how many components are assigned to each partition. In particular, if n balls are thrown into n bins, we expect the max load to be roughly $\frac{\log n}{\log \log n}$ balls. However, if $n \log n$ balls are thrown into n bins, we expect the max load to be $O(\log n)$. With more than $n \log n$ balls, the maximum load approaches the average.
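The balls-and-bins behaviour described above can be observed directly. This is a minimal sketch under the assumption that each component picks its partition uniformly and independently; `max_load` is a hypothetical helper, with balls playing the role of components and bins the role of partitions.

```python
import random
from collections import Counter


def max_load(num_balls, num_bins, seed=0):
    """Throw num_balls balls uniformly at random into num_bins bins
    and return the load of the fullest bin."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(num_bins) for _ in range(num_balls))
    return max(counts.values())


if __name__ == "__main__":
    k = 8  # number of partitions (bins)
    for l in (8, 25, 80, 800):  # number of components (balls)
        ratio = max_load(l, k, seed=1) / (l / k)
        print(l, round(ratio, 2))  # max load relative to the average l/k
```

As l grows relative to k, the ratio of the maximum load to the average l/k tends to move towards 1, which is why more components make a smaller slack sufficient.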

This approach requires that we be able to argue that the inter-component edges have no effect on the concentration location for each component. This is clear when $q = 0$ and there are no 'bad' edges: the location of the concentration of each component is uniform because of the random ordering of the stream. Similarly, if $p = 1$ then the component is located exactly where the first vertex in the component is placed.

For other values of p and q we must use a more sophisticated argument. In particular, we can exploit the gap between p and q to argue that many intra-component edges are seen before any inter-component edges. If the process has run long enough that we can use Lemma 4 to argue that for each component, one partition contains a bit more than half of the vertices that have arrived, then we can argue that the arg max is never changed by the presence of 'bad' edges and that the processes do not affect each other.

Lemma 7. Given a $G(\Psi, P)$ graph with $P_{i,i} = p$, $P_{i,j} = q$ satisfying both Lemma 6 and $q = O((k^{2.4} \log l)^{-1})$, with the number of clusters $l > k \log k$ and all clusters of equal size $|C_i|$, with high probability the maximum load of the partitions is bounded by $(1 + \epsilon)\frac{n}{k}$, where $\epsilon$ is a function of p, l and k.

Proof. We first establish that the location of the concentration for each component is uniformly distributed. This can be done by arguing that the partition that contains the maximum for each component is all-but-$\delta$ dominant and applying Theorem 8 and Lemma 4 to obtain that $q = O((k^{2.4} \log l)^{-1})$. The exact calculation is included in the Appendix.

Given the components are uniformly distributed over the partitions, this is a 'balls-and-bins' process with l balls and k bins. If $l = ck \log k$ then with high probability the maximum load is $d_c \log k$, where $d_c$ is a constant depending on c [23]. When $l \gg k \log k$, with high probability the maximum load is at most $\frac{l}{k} + \sqrt{\frac{2 l \log k}{k}}$. From these results, we conclude that the clusters will be nearly evenly distributed amongst the bins.

[Plot title: Fraction of Full Partitions for 8 partitions, 25 components, p=1, q=0]

Finally, $\epsilon$ needs to be set so that the capacity constraints will not be violated by either of the two sources.
The first is the distribution of vertices before any edges 
appear. This is in expectation ^ vertices, and each par- 
tition will hold ^ of them. The other source of slack 
required is the exact maximum load. This constant depends on l's relationship to k and can be obtained from [23]. □

The number of vertices required for the above argu- 
ment to always hold is quite high in the analysis. 

6. EXPERIMENTAL EVALUATION 

The proofs in the previous sections show that for a 
certain range of parameters and size graph, the algo- 
rithm succeeds in recovering a good partitioning. It 
leaves open some interesting questions that can be ex- 
perimentally evaluated: 

• What is the relationship between $\epsilon$, the load balancing factor in Lemma 7, and k, the number of partitions, and l, the number of components?

• How tight are the bounds? Is it necessary that the density of edges within components is $p > \frac{2 \log n}{|C_i|}$, or that the gap between p and q, the probability of edges between components, is at least $p > 6klq$?

• Are the convergence rates tight? For what size 
graph do we begin to recover the partitioning? 

• When we are asymptotically recovering the parti- 
tioning, can we quantify how many mistakes we 
are making, i.e. how many vertices are separated 
from their components at the end of the process? 

These questions are ideal candidates for experimental simulation. In fact, experimental results here can lead to a much better understanding of the algorithm than theoretical worst case bounds. In the following, using values that satisfy Lemma 6, we generate $G(\Psi, P)$ graphs and see how well arg max Greedy recovers the embedded cut.
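The experimental setup can be sketched in a few lines. The following is a minimal illustration, not the code used for the experiments: `planted_graph` and `argmax_greedy` are hypothetical names, the generator assumes equal-sized clusters with intra-cluster edge probability p and inter-cluster edge probability q, and the greedy rule (most already-placed neighbours, ties broken uniformly at random, least loaded partition when no neighbour has been placed) is an assumption based on the description of the algorithm in the proofs above.

```python
import random


def planted_graph(num_clusters, cluster_size, p, q, rng):
    """Return (adjacency lists, cluster label of each vertex)."""
    n = num_clusters * cluster_size
    label = [v // cluster_size for v in range(n)]
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if label[u] == label[v] else q
            if rng.random() < prob:
                adj[u].append(v)
                adj[v].append(u)
    return adj, label


def argmax_greedy(adj, k, capacity, rng):
    """One pass over a random vertex order; returns partition per vertex."""
    n = len(adj)
    if k * capacity < n:
        raise ValueError("total capacity is smaller than the graph")
    order = list(range(n))
    rng.shuffle(order)
    part = [None] * n
    load = [0] * k
    for v in order:
        open_parts = [j for j in range(k) if load[j] < capacity]
        score = [0] * k
        for u in adj[v]:
            if part[u] is not None:  # neighbour has already arrived
                score[part[u]] += 1
        best = max(score[j] for j in open_parts)
        if best == 0:
            # No placed neighbour in an open partition: use least loaded.
            low = min(load[j] for j in open_parts)
            candidates = [j for j in open_parts if load[j] == low]
        else:
            candidates = [j for j in open_parts if score[j] == best]
        choice = rng.choice(candidates)
        part[v] = choice
        load[choice] += 1
    return part


if __name__ == "__main__":
    rng = random.Random(0)
    adj, label = planted_graph(10, 40, 0.75, 0.0005, rng)
    part = argmax_greedy(adj, k=4, capacity=len(adj), rng=rng)
    print(sorted(part.count(j) for j in range(4)))
```

Setting `capacity` to the number of vertices removes the capacity constraint; a value such as `int(1.2 * n / k)` would correspond to a load balancing factor of 1.2.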

Evaluation. Given a setting of the parameters, we generate a random $G(\Psi, P)$ graph and run the algorithm 25 times, each with a different random ordering. After each run, for each component in the $G(\Psi, P)$ graph, we compute its largest part in the partitioning, i.e. if $C_i$ is the component, and $P_1, P_2, \ldots, P_k$ the final partitioning, we calculate $\max_{j \in [k]} |C_i \cap P_j| / |C_i|$. The theorems predict that for all components, this value approaches 1 as the graph grows. Note that it can never be worse than $\frac{1}{k}$ for k partitions.

Figure 1: Load balancing is not a function of the size of the graph

[Plot title: Fraction of Full Partitions for 8 partitions, p=1, q=0. x-axis: Load Balancing Factor]

Figure 2: Increasing the number of components improves the load balancing.
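The per-component measure $\max_j |C_i \cap P_j| / |C_i|$ can be computed from two label lists. This is a small sketch; `concentration` is a hypothetical helper, and the inputs are assumed to give, for every vertex, its component and the partition it was assigned to.

```python
from collections import Counter


def concentration(cluster_of, partition_of):
    """For each cluster C_i, return max_j |C_i ∩ P_j| / |C_i|.

    cluster_of[v] and partition_of[v] are the cluster and the partition
    of vertex v; both lists must have the same length.
    """
    sizes = Counter(cluster_of)
    overlap = Counter(zip(cluster_of, partition_of))
    best = {}
    for (cluster, _partition), count in overlap.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return {cluster: best[cluster] / sizes[cluster] for cluster in sizes}


if __name__ == "__main__":
    clusters = [0, 0, 0, 0, 1, 1]
    partitions = [0, 0, 1, 2, 1, 1]
    print(concentration(clusters, partitions))  # {0: 0.5, 1: 1.0}
```

A value of 1.0 means the component was kept entirely in one partition.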

6.1 Load Balancing Factor 

Understanding the load balancing factor required is the first step to understanding the other constraints. This is because if the load balancing factor is set too low, we will see this in the error calculations. To understand the slack required, we explore two settings of p and q: $p = 1$ and either $q = 0$ or $q = \frac{p}{6kl}$, where l, the number of components, is larger than $k \log k$. Now, for each size graph, we run the algorithm 20 times and record the number of partitions that hit their capacity constraints. We also vary l to understand how its relationship with k affects the required slack.

We include 3 figures to demonstrate the relationship. The first, Figure 1, shows the fraction of full partitions when $\epsilon$ is allowed to range from 0.01 to 0.5 for graphs of size 4,000, 8,000 and 16,000. There is no difference between the threshold point in these graphs. The second, Figure 2, shows that fixing p, q and k but increasing l, the number of components, yields significantly better load balancing factors. The third, Figure 6 (in the Appendix), shows that whether $q = 0$ or $q = 0.002 = p/6kl$, the load balancing appears the same.

6.2 Density Requirement 

Lemma 6 requires that each component have edge density at least $p > \frac{2 \log n}{|C_i|}$. To explore whether this is necessary, we can fix values for q, k and l and let p range above and below $\frac{2 \log n}{|C_i|}$. For each run, we measure the error from the perfect solution by looking at the Euclidean distance between the length-l vector of the values of $\max_{j \in [k]} |C_i \cap P_j| / |C_i|$ and the all-ones vector.
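The error measure is a plain Euclidean distance. The sketch below uses hypothetical helper names; `max_error` is not stated in the text but follows from the fact that each entry of the vector is at least 1/k, so the distance is at most $\sqrt{l}\,(1 - 1/k)$.

```python
import math


def partition_error(concentrations):
    """Euclidean distance between the vector of per-component values
    max_j |C_i ∩ P_j| / |C_i| and the all-ones vector."""
    return math.sqrt(sum((1.0 - c) ** 2 for c in concentrations))


def max_error(num_components, k):
    """Largest possible error when every entry equals its minimum 1/k."""
    return math.sqrt(num_components) * (1.0 - 1.0 / k)


if __name__ == "__main__":
    print(partition_error([1.0, 1.0, 1.0]))  # 0.0 for a perfect recovery
    print(partition_error([0.5, 1.0]))       # 0.5
    print(max_error(4, 2))                   # 1.0
```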



[Plot title: Euclidean Distance for q=0, p=0 to 1, 4000 vertices, 8 partitions]

Figure 3: For fixed q, k, l values, as p increases, the error in the partitioning generated drops to 0. The vertical bar marks the value required by the theorems.

[Plot title: Euclidean Distance for p=1, q=0 to q=0.07, 4000 vertices, 8 partitions]

Though not pictured in Figure 3, increasing the graph size shifts the 'elbow' of the graph to the left with a sharper transition, matching the bound of the theorem.

6.3 Constraints on q 

As in the experiments to understand the density factor, we can also fix values for p, k and l and let q range above and below $\frac{p}{6kl}$. Is the factor of k necessary? We measure the error by Euclidean distance as above.

We clearly see the effect that increasing q has on the algorithm's ability to recover the partitioning in Figure 4. While the value required by the theorems seems unnecessarily small (and can only be seen by zooming in on this page), dropping the required factor of k and using $q = 0.02$ obtains an average error of only 0.07 over 25 runs when the maximum error is 7.




Figure 4: For fixed p, k, l values, as q increases, the error in the partitioning increases from 0 to maximum error. The leftmost bar at 0.00026 marks the theorems' requirement, while the second at 0.0021 is q = p/6l.



6.4 Convergence Rate 

The values given by the theorems in [11] about the rate of convergence imply a somewhat pessimistic bound - $q = O((k^{2.4} \log l)^{-1})$. We can evaluate this bound by fixing p, q, k and l and letting the size of the graph grow. As it grows, we can measure the Euclidean distance to find how quickly the algorithm is able to obtain good results in terms of recovering the partitioning.

[Plot title: Convergence of Euclidean Error for 8 partitions, p=0.75, q=p/6kl, 400 to 51,200 vertices. Legend: quartiles, average error. x-axis: log_2 of the number of vertices]

Figure 5: This graph shows that for fixed p, k, l, q values, as the size of the graph increases, the error in the partitioning generated drops to 0.

The settings for the algorithm in Figure 5 were $p = 0.75$, $q = \frac{p}{6kl}$, $k = 8$, $l = 100$. The graph sizes range from 400 to 51,200 vertices. We see that as the size of the graph increases, the Euclidean distance from the optimal partitioning solution quickly drops. For 51,200 vertices, the median error for 25 runs is only 0.04. This is despite the fact that the theorem required that $q < 0.000013$ whereas we used $q = 0.00015625$.

7. CONCLUSIONS AND FUTURE WORK 

We have studied two simple greedy algorithms for 
streaming balanced graph partitioning. We first showed 
lower bounds on the possible approximation ratio ob- 
tainable by any algorithm and then analyzed two vari- 
ants of a randomized greedy algorithm on a random 
graph model with embedded balanced k-cuts. On these
graphs we were able to explain previous experimental 
results showing that the arg max Greedy algorithm is 
able to recover a good partitioning while the Propor- 
tional Greedy variant is not. Our proof connects the 
greedy algorithms with finite Polya urn processes and 
exploits concentration results about those processes. 

There are several future directions. The first is to improve the parameters of the analysis. The experiments show that the algorithm continues to work with larger amounts of noise than that allowed by our theorems. The experiments in [25] show that the algorithm performs well on other random graph models like Watts-Strogatz. Explaining this and proving results about the approximation ratio is an interesting question.

Another direction is that in [25], additional stream orderings were studied. Experimentally, the algorithms tested all performed better on both of these orderings than the random ordering. An interesting open question is to develop techniques for analyzing streaming graph algorithms on BFS and DFS orders.
APPENDIX



A. HIGH PROBABILITY BOUNDS FOR 
LEMMA 6 

The experiments justify the assumption that we only need the following two statements to hold with constant probability:

$$E^{(t)} > p|C_i|\frac{t}{n} - \sqrt{p|C_i|\frac{t}{n}}$$

$$B^{(t)} < ql|C_i|\frac{t}{n} + \sqrt{ql|C_i|\frac{t}{n}}$$

Requiring each to hold with probability $1 - \delta$ increases the gap required from $p > 3(k + \sqrt{k} + 1)kql$ by adding a dependency on $\delta$. In particular, redoing the calculations, we have that

$$p|C_i|\frac{t}{n} - \sqrt{\log(1/\delta)\, p|C_i|\frac{t}{n}} > k$$

exactly when

$$t > \frac{n}{p|C_i|}\left(k + \log(1/\delta)/2 + \sqrt{k \log(1/\delta) + (\log(1/\delta))^2/4}\right).$$

Similarly,

$$ql|C_i|\frac{t}{n} + \sqrt{\log(1/\delta)\, ql|C_i|\frac{t}{n}} < 1$$

exactly when

$$t < \frac{n}{ql|C_i|}\left(1 + \log(1/\delta)/2 - \sqrt{\log(1/\delta) + (\log(1/\delta))^2/4}\right).$$

Solving these two equations as in Lemma 6 gives us a similar relationship that $p > f(\delta)kql$.

B. CALCULATION OF Q FOR LEMMA 7 

In order to prove Lemma 7, we need to understand, for a given setting of p and q, how much interaction between the components there is at the $t^{th}$ vertex. In particular, for the $t^{th}$ vertex, we expect that there will be $p|C_i|\frac{t}{n}$ edges from that vertex to its own component (good edges) and $q(n - |C_i|)\frac{t}{n}$ edges to other components (bad edges). Provided $t < \frac{n}{q(n - |C_i|)}$, we do not expect any bad edges, so the components do not interact at all.

When we do begin to see bad edges, we can appeal to Lemma 4. If it is the case that for the given component, one partition contains a $1/2 + x$ fraction of the component that has arrived to this point, and all other partitions split the remaining $1/2 - x$ fraction, then we can argue that the bad edges do not affect the concentration of the process, provided the arg max for the good edges is not changed by the addition of the bad edges. Specifically, we are concerned with $t = \frac{n}{q(n - |C_i|)} > \frac{n}{ql|C_i|}$, so we can find x by solving:

$$(1/2 + x)p|C_i|\frac{t}{n} - \sqrt{(1/2 + x)p|C_i|\frac{t}{n}} > (1/2 - x)p|C_i|\frac{t}{n} + \sqrt{(1/2 - x)p|C_i|\frac{t}{n}}$$

The above equation gives the distribution of the good edges at time t. Substituting that $t \approx \frac{n}{ql|C_i|}$ and there is only one bad edge, we need that

$$(1/2 + x)\frac{p}{ql} - \sqrt{(1/2 + x)\frac{p}{ql}} > (1/2 - x)\frac{p}{ql} + \sqrt{(1/2 - x)\frac{p}{ql}}$$

This results in

$$x = \frac{1}{2}\sqrt{\frac{2(p/ql)^3 - (p/ql)^2}{(p/ql)^4}}$$

From this, we can gather that a sufficient $\gamma$ value required for Lemma 4 is $\gamma = \frac{1}{2} - \sqrt{1/(2(p/ql))}$. Lemma 4 gives a formula for translating this $\gamma$ into a $\delta$ value for Theorem 8. Solving for $\delta$ we get that

$$\delta = \frac{\gamma}{k - 1 - (k - 2)\gamma}.$$

Plugging in our $\gamma$ value, we obtain that

$$\delta = \frac{1/2 - \sqrt{1/(2(p/ql))}}{k - 1 - (k - 2)\left(1/2 - \sqrt{1/(2(p/ql))}\right)}.$$

We can simplify this by claiming that $\delta < \frac{1}{k}$ is sufficient.

The failure probability that we need to obtain from Theorem 8 for Lemma 4 is at most $\frac{1}{k^2 l}$ to use a union bound and still obtain a constant probability of success for the whole process. Therefore, we need to set $n_0' = n_0 + 2 \log k + \log l$.

From here, we can obtain a number of balls thrown before we can obtain this level of concentration. In particular, we need $2^{x+z} n_0$ balls, where $x = \log_{\frac{5\lambda}{1 + 4\lambda}} \frac{0.4}{\epsilon_0}$ and $z = \log_{\frac{2\lambda}{\lambda + 1}} \frac{0.1}{\delta}$. The x term allows us to obtain up to all-but-0.1 dominance, while the second improves the result to all-but-$\delta$ dominance. Therefore, if $k < 10$, then we only need $n_0 2^x$ balls. More generally, substituting that $\epsilon_0 \approx 1/5\lambda$ and $\delta \approx \frac{1}{k}$, this value becomes:

$$(2\lambda)^{1/\log_2(5\lambda/(1 + 4\lambda))} (0.1k)^{1/\log_2(2\lambda/(\lambda + 1))} n_0'$$

The interesting thing about the process is that as more vertices arrive, the $\lambda$ value increases. From this, we can immediately claim that this equation dramatically over-estimates the number of vertices needed before 2 bins would obtain a state with all-but-$\delta$ dominance. In particular, for the p and q values required by Lemma 6, we have $p = 6klq$, so $\lambda$ reaches a value of 3k before we expect to see bad edges. Unfortunately, the best we can assume is that $\lambda = 2$, obtaining the following value:

$$4^{6.578} (0.1k)^{2.4} n_0' \approx 9127\, n_0' (0.1k)^{2.4}$$

It is certainly possible to set q to $1/(9127\, n_0' (0.1k)^{2.4})$, but it is a significantly different bound from $p > 6klq$.

C. EXPERIMENTAL RESULTS 



[Plot title: Fraction of Full Partitions for 8 partitions, p=1, 8000 vertices, 80 components. x-axis: Load balancing factor, from 1 to 1.5]

Figure 6: q does not play a large role in load balancing. Note that q = 0.0005 is above the threshold required by the theorems.